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1     Introduction 

Hydroelectric  scheduling  is  an  important  planning  problem  at  The  Pacific  Gas  and  Electric  Company. 
Depending  on  hydrological  conditions,  PG&E's  hydroelectric  power  plants  generate  roughly  10-20% 
of  the  system's  annual  demand  for  electric  energy.  Other  energy  sources  include  gas-fired  plants, 
the  Diablo  Canyon  nuclear  plant,  and  imports  from  external  sources;  for  simplicity,  we  collectively 
refer  to  these  as  "thermal"  energy  sources.  Thermal  energy  costing  is  complex,  but  for  the  purposes 
of  the  model  we  describe  in  this  paper,  we  assume  a  nonlinear  convex  thermal  cost  function.  Given  a 
hydro  generation  schedule,  this  function  provides  the  cost  of  operating  the  thermal  system  to  satisfy 
the  remaining  demand  for  energy.  An  important  source  of  the  thermal  cost  function's  nonlinearity  is 
the  different  efficiencies  of  various  thermal  plants.  Hydro  units  are  attractive  because  they  generate 
energy  at  a  very  low  variable  cost  and  permit  flexible  scheduling  since  they  can  quickly  ramp  up  to 
full  power.  However,  due  to  reservoir  and  generation  capacities  and  seasonal  variations  in  natural 
inflow  (via  precipitation  and  snowmelt),  they  cannot  be  operated  at  full  capacity  year-round.  The 
scheduling  of  the  hydroelectric  system  is  further  complicated  because  the  volume  of  future  natural 
inflow  into  the  system's  reservoirs  is  uncertain.  The  objective  of  the  model  we  describe  is  to  operate 
the  hydro-thermal  system  with  minimum  expected  cost  for  a  two  year  planning  horizon.  Restated: 
we  wish  to  operate  the  hydro  system  so  as  to  maximize  expected  savings  from  avoided  thermal 
generation  costs.  While  we  give  an  overview  of  the  hydroelectric  scheduling  model  and  coordination 
(with  the  thermal  system)  algorithm  in  §2,  the  reader  is  referred  to  Jacobs  et  al.  [10]  for  a  more 
detailed  description  of  the  ongoing  project  at  PG&E  as  well  as  for  references  to  other  approaches  to 
hydroelectric  scheduling. 

Solutions  from  hydroelectric  scheduling  models  with  deterministic  natural  inflow  forecasts  can  be 
unsatisfactory.  Hydro  generation  decisions  in  such  models  are  made  under  the  incorrect  assumption 
that  forecast  levels  of  natural  inflow  are  ensured  in  forthcoming  months;  if  mean  or  median  inflow 
values  are  used,  the  resulting  solutions  may  fail  to  hedge  against  dry  inflow  scenarios.  As  a  result, 
in  a  dry  scenario,  inefficient  and  costly  thermal  plants  must  be  brought  on  line  to  satisfy  demand 
for  electricity.  On  the  other  hand,  a  conservative  strategy  derived  from  a  relatively  dry  forecast 
may  result  in  forced  spills  due  to  finite  storage  and  generation  capacities.  These  spills  represent  lost 


potential  energy  and  lost  future  cost  savings.  Stochastic  programming  formulations  allow  natural 
inflow  to  be  modeled  as  random  parameters  with  known  distribution,  but  the  size  of  the  resulting 
mathematical  programs  can  be  formidable.  Decomposition  or  L-shaped  algorithms  [3,13]  provide  an 
attractive  approach  to  such  problems  because  they  take  advantage  of  special  structure. 

The  aim  of  this  paper  is  to  present  an  enhanced  decomposition  algorithm  for  multistage  stochas- 
tic programs  and  to  examine  its  performance  on  a  set  of  hydroelectric  scheduling  problems.  The 
remainder  of  the  paper  is  organized  as  follows.  In  §2  the  hydroelectric  scheduling  module  of  the 
stochastic  hydro-thermal  optimization  problem  is  described;  a  collection  of  test  problems  is  also  de- 
tailed. In  §3  we  briefly  review  Benders  decomposition  algorithm  applied  to  multistage  problems  and 
discuss  valid  cut  generation.  The  empirical  performance  of  several  enhancements  to  the  traditional 
algorithm  is  presented  in  §4.  In  §5  we  compare  run-times  of  the  enhanced  decomposition  algorithm 
and  direct  linear  programming  optimizers;  the  paper  is  summarized  in  §6. 

2     The  Model 

A  hydrological  basin  may  be  viewed  as  a  network  consisting  of  a  number  of  reservoirs  (nodes)  that 
are  interconnected  by  rivers,  canals,  and  spillways  (arcs);  energy  is  generated  as  water  flows  through 
powerhouses.  Given  marginal  values  of  energy,  we  model  an  individual  basin  hydroelectric  scheduling 
problem  as  a  T-stage  stochastic  linear  program  with  recourse  (SLP-T): 

T 
maximize        >       >     p,   ct   xt 

«=i  w,en, 

subject  to     -Bt-iX^  +  Atxf  =  b"' ,     0  <  <«  <  up ,     w<  €  ft, 

SLP-T     for  <=1 T 

where  Bq  =  0. 

The  sample  space  for  stage  t  is  denoted  ft,,  and  a  sample  point  (scenario)  in  ft,  is  denoted  u>t.  A 

stage  t  >  2  scenario,  u>, ,  has  a  unique  stage  t  —  1  ancestor  denoted  a(w,),  and  a  stage  t  <  T  scenario 

has  a  set  of  descendant  scenarios  denoted  A{ut).  At  is  an  mt  x  nt  matrix  and  the  remaining  matrices 

and  vectors  are  dimensioned  to  conform.  A  stage  t  realization,  £<(u;,)  =  (c^1 ,  u^' ,  b"'),  is  a  vector  in 

5fN' ,  where  TV,  =  2nt  +  mt.  We  assume  a  finite  number  of  scenarios  and  a  probability  mass  function 

given  by  P  {£t  =  £t(uit)}  =  pf*  ■    For  notational  convenience,  we  have  created  a  first  stage  sample 


space,  fii  that  is  a  singleton  set  where  £i(u>i)  represents  the  known  state  at  the  time  decisions  are 
made  in  the  first  stage;  clearly,  p"1  has  value  one.  SLP-T  has  J2t=i  m«l^il  structural  constraints 
and  5jt_j  "tl^tl  decision  variables  where  |fit|  denotes  the  cardinality  of  Qt- 

We  may,  nominally,  regard  At  as  the  node-arc  incidence  matrix  for  the  hydrological  network. 
The  actual  form  of  At  is  more  complex  for  several  reasons.  First,  non-network  side  constraints 
must  be  incorporated;  e.g.,  decrees  constrain  the  volume  of  water  in  a  subset  of  the  reservoirs  to 
a  minimum  level.  Second,  subperiod  modeling  is  necessary  to  capture  differences  in  peak  and  off- 
peak  values  of  energy.  Third,  stages  contain  different  numbers  of  time  periods  depending  on  the 
corresponding  level  of  uncertainty  in  natural  inflow.  For  example,  summer  months  are  relatively  dry 
and  multiple  time  periods  are  incorporated  in  a  single  stage;  the  snowmelt  season  in  the  spring,  on 
the  other  hand,  is  a  period  of  greater  uncertainty  and  shorter  time  stages  are  required.  In  addition, 
longer  time  stages  are  typically  used  in  the  later  stages  of  the  model. 

In  SLP-T,  decisions  occur  and  uncertainties  unfold  in  the  following  manner.  The  first  stage 
hydro  generation  and  storage  decisions  are  made  with  distributional  information  on  future  data; 
next,  a  specific  scenario  is  revealed  and  second  stage  decisions  are  made  knowing  this  data,  the  first 
stage  decision,  and  conditional  probability  distributions  on  future  inflows  . .  .The  goal  is  to  operate 
the  hydro  basin  with  maximum  expected  benefit,  in  terms  of  avoided  thermal  generation  cost,  for 
T  time  stages.  The  model  has  essentially  three  arc  types:  energy  generation  arcs,  other  spatial 
water  transport  arcs,  and  "transition-in-time"  arcs.  The  matrix  Bt  contains  arcs  of  the  third  type 
that  are  used  to  equate  the  amount  of  water  left  in  a  reservoir  at  the  end  of  one  stage  with  the 
amount  of  water  in  the  same  reservoir  at  the  beginning  of  the  next  stage.  Generation,  spill,  and 
reservoir  capacities  appear  as  simple  bounds  on  the  three  arc  types.  Initial  reservoir  volumes  are 
contained  in  61;  subsequent  6p  vectors  contain  the  uncertain  natural  inflows.  Energy  generation  arcs 
have  objective  function  coefficients  that  represent  marginal  values  of  energy  in  terms  of  dollars  per 
thousand  acre  foot;  these  values  are  stochastic  and  usually  larger  in  drier  natural  inflow  scenarios. 
Other  arcs  typically  have  objective  function  coefficients  of  zero.  In  general,  the  stochastic  parameters 
exhibit  interstage  dependence.  For  instance,  relatively  large  inflows  at  lower  elevations  in  the  winter 
months  are  often  coupled  with  a  growing  snowpack  at  higher  elevations  which  will,  in  turn,  lead 


to  large  inflows  when  melting  occurs  in  the  spring.  Methods  of  dealing  with  finite  horizon  effects 
include  minimum  final  period  reservoir  storage  requirements  or  a  final  period  future  value  function; 
both  methods  may  involve  scenario  dependent  constraints. 

The  individual  basin  programs  described  above  are  subproblems  of  a  larger  multistage  stochastic 
nonlinear  program.  Hydro  scheduling  decisions  must  be  made  in  a  number  of  basins.  The  only 
constraints  linking  the  different  basins  are  the  load  constraints;  these  require  that  at  each  possible 
point  in  time,  demand  for  energy  be  satisfied.  The  Dantzig- Wolfe  decomposition  principle  may  be 
applied  to  create  a  nonlinear  master  problem  that  contains  proposed  hydro  solutions,  the  demand 
constraints,  and  the  nonlinear  thermal  cost  function.  The  linear  subproblem  separates  into  a  sum  of 
independent  subproblems  by  hydro  basin  of  the  form  of  SLP-T.  The  Dantzig-Wolfe  master  generates 
scenario  dependent  marginal  values  of  energy  for  the  subproblems  and  the  subproblems,  in  turn, 
pass  proposed  hydro  solutions  to  the  master  problem.  We  refer  the  reader  to  Eiselt,  Pederzoli,  and 
Sandbloom  [5]  for  a  discussion  of  nonlinear  Dantzig-Wolfe  decomposition. 


Name 

Size 

Dim 

Deterministic 
Equivalent 

Moke3.9 

169  x  820,  337  x  1713,  673  x  3298 

7 

7237  x  28473 

Ybsf3.9 

319  x  1559,  637  x  3119,  1273  x  6239 

12 

13687  x  53489 

Moke4.45 

57  x  271,  113  x  548,  337  x  1713,  673  x  3298 

7 

35736  x  140656 

Ybsf4.45 

107  x  519,  213  x  1039,  637  x  3119,  1273  x  6239 

12 

68012  x  265836 

Table  1:  Test  Problems 

In  §4,  we  examine  the  performance  of  an  enhanced  decomposition  algorithm  on  a  collection  of 
test  problems  that  are  preliminary  versions  of  individual  basin  hydroelectric  scheduling  problems  as 
described  above.  The  test  problems  are  models  with  different  time  horizons,  stage  definitions,  and 
discretizations  of  natural  inflow  distributions.  The  models  are  based  on  two  of  the  larger  hydrological 
basins  in  the  PG&E  system:  Mokelumne  (Moke)  and  Yuba-Bear-South  Feather  (Ybsf).  In  Table  1, 
"Name"  indicates  the  hydrological  basin,  the  number  of  stages  and  number  of  scenarios;  for  example, 
Moke3.9  is  a  three  stage  model  of  the  Mokelumne  basin  with  |fi3|  =  9.  "Size"  gives  the  row  and 
column  dimensions  of  the  At  matrices  for  each  stage.  The  dimension  of  the  domain  of  the  recourse 
function  is  the  number  of  large  reservoirs  in  a  basin;  this  value  is  denoted  "Dim."  "Deterministic 
Equivalent"  gives  the  row  and  column  dimensions  of  the  SLP-T  formulation. 
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3     Benders  Decomposition  and  Valid  Cut  Generation 

We  may  write  SLP-2,  in  minimization  form,  as  follows: 

minimize      {  c\X\  +  Euh(xi,u)  }  , 

*i  >  o  v   > 

where 


/i(xi,w)  =     minimize       CjXj 

subject  to    .42*2     =  b2  +  5i*i  (2) 

x£     >0. 
Note  the  simple  upper  bounds  are  not  explicit;  when  this  is  the  case,  it  may  be  assumed  that  they 

have  been  included  in  the  structural  constraints. 

Benders  decomposition  for  SLP-2  (see  Van  Slyke  and  Wets  [13])  is  a  resource  directed  decompo- 
sition. A  first  stage  decision  is  passed  to  the  right-hand-sides  of  the  second  stage  recourse  problems 
(2)  which  then  act  optimally  under  each  scenario  w.  Supports  of  the  piecewise  linear  convex  recourse 
function,  £'u,/i(zi>u;)>  called  cuts,  are  derived  from  the  dual  of  (2)  and  are  subsequently  passed  back 
to  the  first  stage  master  problem  and  the  process  repeats.  Under  the  assumption  that  second  stage 
"infeasibilities"  are  modeled  by  a  penalty  function,  a  forward  pass  of  the  algorithm  generates  a  feasi- 
ble decision  and  hence  an  upper  bound  on  the  optimal  objective  function  value.  We  can  also  obtain 
a  lower  bound  on  the  optimal  objective  function  value  via  the  master  program's  objective  function 
value:  the  cuts  collected  so  far  provide  an  outer  linearization  of  the  recourse  function.  The  extension 
of  this  procedure  to  SLP-T  can  be  viewed  as  follows.  SLP-T  is  first  decomposed  into  two  stages: 
stage  1  and  stage  2, . . .  ,T.  After  the  first  stage  problem  passes  resources  to  the  right-hand-sides 
of  the  stage  2  subproblems,  there  are  \^2 1  linear  programs  to  solve.  Each  of  these  linear  programs 
is  solved  via  decomposition  into  two  stages:  stage  2  and  stage  3,...,T,  and  so  on.  This  nested 
method  is  the  "traditional"  nested  Benders  decomposition  approach  to  multistage  stochastic  linear 
programming  (see  Birge  [3]);  a  more  formal  description  is  provided  in  §4.4. 

We  now  provide  conditions  under  which  valid  cuts  can  be  generated;  these  results  prove  useful 
in  developing  some  of  the  enhancements  to  the  traditional  algorithm  described  in  §4.  A  valid  cut  is 
defined  to  be  a  cut  that  lies  below  the  recourse  function.  Lemmas  1  and  2,  state  that  dual  feasible 
vectors  generate  valid  cuts  for  SLP-2  and  SLP-T,  respectively. 


Lemma  1  Consider  SLP-2  with  |Q|  =  K.  If  (x1, . . . ,  irK)  are  dual  feasible  for  the  second  stage 
subproblems  then  these  dual  vectors  generate  a  valid  cut  for  the  first  stage  master  program. 

Proof 

Let  W"  =  {x  :  irA2  <  <%}  denote  the  dual  feasible  region  of  (2).  By  hypothesis,  tt"  G  Tiw  V  u>  G  fi; 
thus 

x"  (6£  +  BiXi)  <   h(xitu)  =     m^imi5e     *"  (b%  +  Bm)      Vxj. 

By  taking  expectations  we  see  Gx\  +  g  is  a  valid  cut  where  G  =  Ewfrw B\  and  g  =  JE?t„fru'6^.  ■ 

With  the  exception  of  the  final  stage,  the  difference  in  the  multistage  setting  is  that  subproblems 
generating  cuts  for  their  ancestors  contain  their  own  cuts.  Lemma  1  implies  dual  feasible  vectors  to 
the  stage  T  subproblems  generate  valid  cuts  for  their  stage  T—  1  ancestors,  but  a  new  result  is  needed 
for  the  general  case.  Lemma  2  ensures  that  if  the  stage  t  (2  <  t  <  T)  subproblems  contain  valid 
cuts  then  they  will,  given  dual  feasible  vectors,  generate  valid  cuts  for  their  stage  t  —  1  ancestors. 
The  stage  t  (1  <  t  <  T)  subproblem  under  scenario  w(,  denoted  sub(u>,),  has  the  following  form: 

minimize  c^'x"'       -f  0J"' 

subject  to  Atxf'  =  b^'  4-  Bt_\Xt_\ 

sub(ut)  -Ofx?     +  e  6?     >  gf 

*T  >  o. 

The  rows  of  the  matrix  Gf  contain  cut  gradients;  the  elements  of  the  vector  gf'  are  cut  intercepts; 
and  e  denotes  the  vector  of  all  l's.  As  the  algorithm  proceeds,  the  row  dimension  of  these  quantities 
will  grow. 

Lemma  2  Suppose  \A(uit)\  =  K ,  1  <  t  <  T  —  2,  and  the  descendants  of  sub(uit)  contain  valid  cuts. 
If  [(*t+ii^«+i)>  •  •  •»(*H-ii<*t+i)]  art  dual  feasible  vectors  for  the  descendants  of  sub(u>t)  then  these 
dual  vectors  generate  a  valid  cut  for  sub(ut). 

Proof 

Denote  the  value  of  subproblem  (3)  by  ^t-i(X|-i,Wt»G!t,'>S't'1)-  Let  (G^l\l  ,^+t')  be  a  set  of  valid 
cuts  and  define  ft(xt,u>t+i)  =  ■77<(*t.u'i+i  .G^+i'  '(t'tlV)-  For  each  ut+\  G  A(w<)  there  exists  a 
finite  set  of  cuts,  denoted  by  ( G^t ' '  ffr+t ' ) -  sucn  that  the  conditional  stage  *  +  2  recourse  function 
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is  completely  specified;  let  ht(xt,ut+i)  =  !Ft(xt,u)t+\  ,&t+\  ,  (j't+V  )■  Now  suppose  (Jr^*1  ,  o^-i-V  )  *s 
dual  feasible  for  the  program  corresponding  to  ft{xt,ut+i).  Then 

KIV(K+V+Btxt)  +  *%V9?IV  </t(*t,<*+i)<M*t."t+i)    V*,.  (4) 

The  left  hand  inequality  is  immediate  by  writing  the  program  corresponding  to  ft(xt ,w<+i)  in  its 
dual  form.  The  right  hand  inequality  follows  as  the  cut  constraints  in  ft(xt,u)t+i)  are  dominated  by 
the  cuts  of  ht(xt,u;t+\).  The  desired  result  is  obtained  by  multiplying  (4)  by  pfl\  '  and  summing 
over  all  «t+i6  A(u>t).  ■ 

4     The  Enhanced  Decomposition  Algorithm 

In  this  section,  four  enhancements  to  the  traditional  nested  Benders  algorithm  are  presented:  Warm 
start  techniques  obtain  "good"  initial  basic  feasible  solutions  for  subproblems  based  on  optimal  basis 
information  from  previous  subproblem  solutions.  Advanced  start  procedures  generate  preliminary 
cuts  prior  to  initiating  a  formal  Benders  algorithm.  The  multicut  Benders  decomposition  algorithm 
returns  one  cut  for  each  descendant  of  a  particular  subproblem  instead  of  a  single  aggregate  cut. 
Tree  traversing  strategies  prescribe  the  order  in  which  subproblems  of  the  decision  tree  are  solved. 

The  enhanced  algorithm  is  implemented  in  FORTRAN  77,  and  the  computational  results  we 
present  are  from  a  Hewlett  Packard  9000/750  workstation.  The  algorithm  uses  NETSIDE,  a  special 
purpose  code  that  solves  the  minimum  cost  flow  network  problem  with  side  constraints,  developed 
by  Kennington  and  Farhangian  as  the  subproblem  solver.  NETSIDE  is  based  on  a  specialization  of 
the  primal  simplex  method  (see  Kennington  and  Helgason  [11]  and  Barr  et  al.  [2]);  in  our  setting, 
the  set  of  side  constraints  includes  Benders  cuts. 

The  performance  of  a  particular  algorithmic  enhancement  will  be  analyzed  with  respect  to  a  base 
case  strategy  which  is  the  strategy  we  recommend.  Thus,  we  evaluate  the  marginal  effect  of  each 
enhancement.  The  base  case  strategy  and  its  performance  on  the  four  test  problems  is  summarized 
in  Tables  2  and  3;  the  details  are  presented  in  the  respective  subsections  that  follow.  All  problems 
are  solved  to  within  an  objective  function  tolerance  of  0.01%.  Reported  CPU  times  exclude  input 
and  output  operations.  The  #  subproblems  column  of  Table  3  gives  the  number  of  subproblems 
solved  in  each  stage  during  the  course  of  the  algorithm. 


Warm  Start 

Advanced  Start 

Multtcut 

Tree  Traversing  Strategy 


Candidate  list  of  length  20  until  relative  error  <  5% 
Prespecified  Decisions  Method  with  shared  cuts 
Yes 
Fastpass 


Table  2:  Base  Case  Strategy 


Name 

#  subproblems 

#  iterations 

CPU  (sec.) 

Moke3.9 

10-59-99 

10 

55.4 

Ybsf3.9 

7-47-72 

7 

119.8 

Moke4.45 

10-64-274-459 

10 

221.4 

Ybsf4.45 

8-83-205-369 

8 

404.0 

Average 

N/A 

8.8 

200.2 

Table  3:  Base  Case  Strategy  Performance 


4.1      Warm  Start 

Similar  subproblems  are  repeatedly  solved  during  the  course  of  a  decomposition  procedure.  Tech- 
niques that  take  advantage  of  optimal  basis  information  from  previous  subproblem  solutions  are 
critical  for  developing  efficient  algorithms.  Wets  [14]  and  Garstka  and  Rutenberg  [6]  have  proposed 
bunching  and  sifting  techniques,  respectively  for  solving  a  large  number  of  similar  linear  programs. 
The  sifting  method,  however,  requires  component-wise  independence  of  the  right-hand-side  and  de- 
terministic objective  function  coefficients;  our  test  problems  violate  these  requirements.  Moreover, 
in  our  experiments  there  were  a  low  number  of  "repeat"  optimal  bases  which  seemed  to  indicate 
bunching  might  not  occur.  While  the  primary  computational  challenge  in  the  two-stage  problem 
lies  in  the  solution  of  a  large  number  of  similar  second  stage  problems  each  iteration,  the  greatest 
potential  for  computational  savings  in  a  multistage  problem,  with  only  a  few  stochastic  branches 
each  stage,  rests  in  an  ability  to  select  good  initial  bases  for  each  subproblem. 

We  propose  a  warm  start  technique  in  which  initial  bases  are  selected  from  a  candidate  list  until 
the  relative  error  is  sufficiently  small.  In  subsequent  iterations,  subproblems  are  initialized  with 
the  basis  from  their  previous  optimal  solution.  The  columns  of  a  basic  solution  of  subproblem  (3) 
can  be  partitioned  into  a  network  component  and  a  side  constraint  component;  see  Kennington  and 
Helgason  [11].    The  heuristic  used  to  select  the  best  basis  from  the  candidate  list  for  a  particular 


subproblem  proceeds  as  follows: 

(i)  Calculate  network  flows  from  the  network  component  of  each  candidate  list  basis. 

(ii)  Calculate  solutions  for  the  entire  subproblem  based  on  each  network  flow  solution. 

(iii)  Determine  the  objective  function  value  of  each  solution. 

(iv)  Select  the  candidate  list  basis  that  has  the  minimum  corresponding  objective  function  value. 

Step  (i)  can  be  performed  quickly  due  to  the  tree  structure  of  the  network  basis.  In  step  (ii) 
we  substitute  the  solution  from  (i)  into  the  side  constraints  and  generate  a  feasible  solution  from 
slack,  surplus,  artificial,  and  future  cost  variables.  The  "best"  basis  is  then  determined  from  the 
objective  function  value  of  each  solution.  Note  that  the  value  calculated  in  (iii)  is  only  an  estimate 
of  the  actual  objective  function  value  that  the  basis  will  yield  since  the  side  constraint  component  of 
the  basis  is  not  considered.  We  ignore  this  component  due  to  the  computational  effort  required  for 
refactorization  and  the  fact  that  the  network  constitutes  most  of  the  subproblem.  Minimal  storage 
is  required  for  each  basis:  arc  indices  within  a  pointer  structure,  a  list  of  upper  bound  arc  indices 
for  the  network,  and  a  list  of  column  indices  for  the  side  constraints.  See  Jacobs  [9]  for  the  details  of 
the  NETSIDE  basis  insertion  procedure.  Warm  start  parameters  that  the  user  must  select  are  the 
maximum  size  of  the  candidate  lists  and  the  relative  error  at  which  the  method  switches  from  the 
candidate  list  heuristic  to  simply  reusing  the  previous  optimal  basis  for  each  subproblem;  reasonable 
values  for  these  parameters  are  suggested  below  in  the  computational  results  discussion. 

Computational  Results 

Columns  2-4  of  Table  4  detail  the  performance  of  the  algorithm  when  each  subproblem  solution 
begins  with  an  all-artificial  basis.  Similarly,  columns  5-7  show  the  empirical  results  of  the  strategy 
in  which  the  candidate  list  heuristic  is  not  used  and  we  simply  recall  the  previous  optimal  basis  for 
each  subproblem;  if  such  a  basis  does  not  exist  (because  a  particular  subproblem  has  not  yet  been 
solved)  then  the  optimal  basis  of  another  subproblem  from  the  same  stage  is  used.  The  "x  increase" 
column  is  defined  as  T/T(,c  and  the  "%  increase"  column  as  (T  —  T\,c)jT\,c  •  100.  Tt,c  denotes  the 
running  time  of  the  base  case  strategy  and  T  the  modified  strategy;  e.g.,  the  base  case  with  no  warm 
start.  If  the  %  increase  column  is  20  then  it  is  to  be  read:  "the  current  strategy's  running  time  is 
20%  longer  than  the  running  time  of  the  base  case  strategy."   Table  4  reveals  the  dramatic  impact 


of  warm  starts  and  also  indicates  that  substantial  computational  savings  can  result  from  using  the 
heuristic  to  select  good  bases  early  in  the  algorithm.  The  values  of  5%  for  the  switch  over  tolerance 
and  20  for  the  candidate  list  length  (see  Table  2)  were  determined  by  varying  these  parameters  on 
a  wider  variety  of  test  problems  using  the  base  case  strategy  with  and  without  an  advanced  start. 


Name 

No  Warm  Start 

Recall  Previous  Basis 

iter. 

CPU  (sec.) 

x  increase 

iter. 

CPU  (sec.) 

%  increase 

Moke3.9 

13 

909.1 

16.4 

12 

75.0 

35.4 

Ybsf3.9 

8 

1973.3 

16.5 

8 

163.4 

36.4 

Moke4.45 

11 

3793.4 

17.1 

11 

263.4 

19.0 

Ybsf4.45 

11 

14267.8 

35.3 

9 

526.1 

30.2 

Average 

10.75 

5253.9 

21.3 

10 

257.0 

30.3 

Table  4:  Warm  Start  Procedures 

4.2      Advanced  Start 

A  Benders  decomposition  algorithm  initiated  with  no  preliminary  cuts  generates  myopic  first  it- 
eration decisions  and  "extreme"  decisions  in  the  early  iterations.  The  idea  behind  an  advanced 
start  method  is  to  calculate  preliminary  cuts  to  help  guide  the  early  iterations  of  the  algorithm.  A 
technique  utilized  by  Infanger  [8]  involves  first  solving,  by  Benders  decomposition,  the  much  smaller 
expected  value  problem  in  which  the  stochastic  parameters  of  SLP-T  are  replaced  by  their  population 
means.  The  cuts  produced  in  the  process  are  valid  for  SLP-T  if  the  stochastic  parameters  exhibit 
interstage  independence  and  appear  only  in  Bt  and  bt;  this  claim  is  easily  verified  via  Lemmas  1 
and  2.  However,  in  the  stochastic  hydroelectric  scheduling  problems  the  objective  function  coeffi- 
cients are  random  and  the  stochastic  parameters  are  temporally  correlated;  thus  the  expected  value 
method  is  not  applicable  and  we  seek  an  alternate  approach. 

A  "prespecified  decisions"  method  for  computing  preliminary  cuts  requires,  for  each  stage  t  <  T, 
a  collection  of  decision  vectors:  <  x\  :  i  =  1 , . . . ,  Nt  > .  The  basic  idea  is  to  generate  preliminary  cuts 
by  solving  subproblems  with  right-hand-sides  of  the  form:  Bf_ixJ_j  +bf*.  A  naive  implementation 
involves  solving  the  descendants  of  each  scenario  tree  node  at  each  of  the  prespecified  decisions  and 
computing  the  corresponding  cuts.  In  the  streamlined  approach  described  in  this  section,  we  select 
a  single  scenario  ut  on  each  stage  and  solve  the  descendants  of  scenario  Wj_i  and  then  pass  a  cut 
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back  not  only  to  sub(u>f_i)  but  also  to  all  its  stage  t  —  1  neighbors  via  the  dual  sharing  formula 
described  below;  Figure  1  summarizes  the  method.  In  a  symmetric  four  stage,  45  scenario  problem 
with  1,  3,  15,  and  45  scenarios  on  each  stage  and  Nt  =  3,  the  number  of  subproblems  solved  in  the 
streamlined  method  is  and  33  while  the  naive  method  is  189. 

define  ut ,  t  =  1, . . . ,  T  —  1 

do  t  —  T  downto  2 
do  i  =  1  to  Nt-x 
do  u>t  €  A(u>t-i) 

form  RHS  of  sub(u>«):  Bt.1x(ti}1  +b?' 
solve  sub(u>f) 
enddo 

pass  cut  to  sub(u;t_i)  V  w«_i  €  fi«-i 
enddo 
enddo 

Figure  1:  Prespecified  Decisions  Method  -  Sharing  Duals 

The  only  relevant  components  of  the  prespecified  decision  vectors  are  ones  that  contribute  to 
the  product  Bt^iit_1;  this  corresponds  to  end-of-stage  reservoir  storage  volumes  in  the  hydro 
scheduling  problems.  Reservoirs  have  natural  lower  and  upper  storage  bounds,  and  in  the  absence 
of  additional  information  (e.g.,  historical  storage  levels  or  values  from  prior  optimizations),  the 
prespecified  decisions  are  defined  as  percentiles  between  the  upper  and  lower  bounds.  In  particular, 
we  use  three  prespecified  vectors  at  20%,  50%,  and  80%,  and  we  define  u>t  —  \Kt/2~\  where  Qt  — 
{l,...,A't}  and  where  the  ceiling  function  [•]  gives  the  smallest  integer  greater  than  or  equal  to 
its  argument.  The  order  of  cut  computation  is  relevant;  for  example,  in  a  three  stage  problem  all 
preliminary  cuts  are  passed  to  the  second  stage  prior  to  passing  any  cuts  to  the  first  stage.  In  this 
way,  the  maximum  possible  amount  of  information  is  subsequently  passed  to  the  first  stage. 

The  Dual  Sharing  Formula 

The  dual  of  sub(u>t)  (see  program  (3))  with  explicit  simple  bounds  may  be  written: 

maximize     <•  (b?  +  Bt-iX&'A  +  o^tf*  "  rf'uV 
subject  to     <Mr  -  a^GJ"  -  ff  <  cj" 

eTaf  =  1 

(*?  >  0,   fit*  >  0. 
In  describing  the  dual  sharing  formula,  super  and  subscripts  are  suppressed  for  clarity.    Suppose 

we  have  solved  a  stage  t  subproblem  with  data  (c,G,A)  and  obtained  optimal  dual  prices  (7r,d,/i). 
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Given  another  stage  /  subproblem  with  data  (c,G,A)  feasible  dual  prices  can  be  generated  directly 
from  (ir,  a,  ft)  via 

(v,Q,fi)  -  \v,a,[*A  -aG-c]+ J  .  (5) 

Dual  feasibility  of  (w,a,fi)  is  easily  verified  by  substitution.  The  [v]+  notation  means  take  the 
positive  part  of  the  vector  v,  component-wise.  We  refer  to  (5)  as  the  dual  sharing  formula.  Note  (5) 
is  also  valid  for  stage  T  subproblems  when  the  (vacuous)  cut  gradient  matrix  and  associated  dual 
variables  are  dropped. 

Suppose  we  have  solved  the  descendants  of  sub(u;();  using  the  corresponding  optimal  dual  vari- 
ables, the  dual  sharing  formula  may  be  applied  to  compute  a  valid  cut  for  sub(u>J).  To  compute  this 
cut,  we  match  elements  of  A(w()  and  A(wJ).  (Another  possibility  is  to  select  the  dual  vectors  that 
produce  the  strongest  cut  at  a  particular  stage  t  decision.)  By  Lemmas  1  and  2  these  feasible  dual 
vectors  generate  valid  cuts.  Note  that  a  cut  generated  by  applying  (5)  can  be  weak;  the  extreme  case 
is  a  positive  price  on  an  infinite  simple  bound.  In  the  hydroelectric  scheduling  problems,  however, 
the  only  arcs  that  can  have  nonzero  shared  ^z's  have  natural  finite  bounds. 

Computational  Results 

Columns  2-4  of  Table  5  detail  the  performance  of  the  base  case  strategy  with  no  advanced  start  and 
columns  5-7  show  the  performance  when  we  use  the  naive  advanced  start  without  the  dual  sharing 
formula.  In  selecting  an  advanced  start  procedure,  one  must  balance  the  computational  benefit 
that  the  preliminary  cuts  yield  with  the  cost  of  generating  the  cuts.  Table  5  shows  the  base  case 
strategy  of  solving  only  a  subset  of  subproblems  on  each  stage  and  utilizing  the  dual  sharing  formula 
provides  an  attractive  advanced  start.  The  average  relative  error  after  the  first  iteration  for  the 
base  case  strategy  and  advanced  start  strategy  with  no  sharing  is  6.8%  and  6.1%,  respectively,  and 
the  corresponding  average  CPU  times  for  the  first  iteration  are  67.6  sec.  and  117.8  sec.  Thus  the 
slower  method  produces  slightly  stronger  cuts,  but  the  empirical  results  indicate  the  computational 
expense  is  too  high.  The  table  reveals,  however,  the  naive  advanced  start  procedure  is  preferable 
to  none  at  all,  and  that  the  advanced  start  procedures  provide  greater  computational  savings  in  the 
four  stage  problems  than  in  the  three  stage  problems. 
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Name 

No  Advanced  Start 

Naive  Advanced  Start 

iter. 

CPU  (sec.) 

%  increase 

iter. 

CPU  (sec.) 

%  increase 

Moke3.9 

15 

68.4 

23.5 

10 

60.0 

8.3 

Ybsf3.9 

9 

118.5 

-1.1 

7 

129.1 

7.8 

Moke4.45 

13 

318.9 

44.0 

9 

255.3 

15.3 

Ybsf4.45 

13 

614.7 

52.2 

9 

553.1 

36.9 

Average 

12.5 

280.1 

29.7 

8.8 

249.4 

17.1 

Table  5:  Advanced  Start  Procedures 

4.3     Multicut  Algorithms 

Birge  and  Louveaux  [4]  introduced  the  multicut  L-shaped  algorithm  for  SLP-2  (1).  The  traditional 
aggregate  cut  algorithm  creates  a  master  program  by  replacing  EwenP^M1!^)  m  program  (1)  by 
0  and  sequentially  adding  cuts  of  the  form  6  >  (£3u/€nPu,7ru;-Si)  x\  +  22u,€nPu'7rU'^2  eac^  iteration. 
The  multicut  version  instead  replaces  h(xi,u)  by  0"  for  all  w  G  ^  and  each  iteration  appends  |Q| 
cuts  of  the  form  0W  >  (irwB\)xi  +  it^b^-  When  compared  with  the  aggregate  cut  algorithm,  the 
multicut  algorithm  has  the  disadvantage  of  requiring  more  decision  variables;  similarly,  after  a  given 
number  of  iterations,  it  also  maintains  a  larger  number  of  cut  constraints  in  the  master  program. 
This,  however,  is  countered  by  the  advantage  of  increased  resolution  of  the  recourse  function.  In 
practice  we  expect  multicut  algorithms  will  typically  take  fewer  iterations  than  their  aggregate  cut 
counterparts.  (Birge  and  Louveaux  [4],  however,  present  a  counter  example  showing  this  is  not  always 
the  case.)  The  multicut  algorithm  extends  to  the  multistage  setting  in  a  straightforward  fashion. 
Each  descendant  scenario  passes  back  a  cut  to  its  ancestor  and  the  ancestor  objective  function  has 
the  form: 

Other  generalizations  of  the  multicut  algorithm  are  possible;  the  descendants  can  be  partitioned 
into  disjoint  sets  and  a  "6"  defined  for  each  set.  In  the  multicut  algorithm  described  above,  each  set 
of  the  partition  is  a  singleton,  and  the  aggregate  cut  algorithm  is  the  special  case  where  the  only 
set  of  the  partition  is  A(u>t)  itself.  A  coarse  partition  version  of  the  multicut  algorithm  should  be 
particularly  attractive  when  the  number  of  scenarios  is  large.  The  other  enhancements  discussed  in 
this  paper  can  run  in  either  aggregate  cut  or  multicut  mode. 
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Computational  Results 


Name 

#  iterations 

CPU  (sec.) 

%  increase 

Moke3.9 

22 

107.3 

93.7 

Ybsf3.9 

13 

146.3 

22.1 

Moke4.45 

18 

318.1 

43.7 

Ybsf4.45 

17 

662.0 

63.9 

Average 

17.5 

308.4 

55.9 

Table  6:  Single  Cut 

Table  6  details  the  performance  of  the  strategy  in  which  the  multicut  method  is  replaced  by  the 
single  cut  procedure  in  the  base  case  strategy.  The  average  number  of  iterations  in  the  multicut 
procedure  is  about  half  that  of  the  single  cut  method.  However,  due  to  the  quality  of  the  warm 
start  procedure  the  corresponding  running  times  are  not  halved.  Nevertheless,  Table  6  shows  the 
multicut  method  yields  a  significant  computational  advantage.  The  relative  error  as  a  function  of 
CPU  time  is  plotted  in  Figure  2  for  three  strategies  on  test  problem  Moke4.45.  Note  (i)  the  faster 
convergence  of  the  multicut  algorithm,  (ii)  the  additional  computational  effort  but  improved  initial 
relative  error  of  the  advanced  start  procedure,  and  (iii)  the  effect  of  the  warm  start  on  the  time  per 
iteration  as  the  algorithm  proceeds. 
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Figure  2:  Relative  error  vs.  CPU  time  -  Moke4.45 


14 


4.4     Tree  Traversing  Strategies 

For  brevity,  we  refer  to  the  nested  Benders  decomposition  algorithm  described  in  §3  as  the  shuffle  tree 
traversing  strategy.  Abrahamson  [1]  and  Wittrock  [15]  developed  strategies  other  than  shuffle  for 
deterministic  multistage  linear  programs;  we  consider  two  of  these:  fasipass  and  cautious.  Gassmann 
[7]  tested  shuffle,  fasipass,  and  cautious  in  the  stochastic  setting.  Abrahamson,  Wittrock,  and 
Gassmann  found  fasipass  to  be  an  effective  strategy.  In  addition  to  these  three  strategies,  we 
present  two  new  classes  of  tree  traversing  strategies. 

The  two  extreme  strategies  are  shuffle  and  cautious.  Shuffle  only  goes  backward  up  the  tree  if  it 
cannot  go  forward;  i.e.,  it  solves  all  the  stage  t+1,...  ,T  subproblems  explicitly  (within  a  tolerance) 
prior  to  passing  cuts  back  to  stage  t.  On  the  other  hand,  cautious  never  goes  forward  down  the  tree 
unless  all  cuts  that  would  be  passed  back  to  stage  t  —  1  are  redundant.  Fasipass  is  a  strategy  "half- 
way" between  shuffle  and  cautious.  We  introduce  the  e-shuffle  and  c-cautious  strategies:  e-shuffle  is 
a  strategy  that  is  in  between  shuffle  and  fasipass;  it  is  less  hesitant  to  go  backward  up  the  tree  than 
the  former  but  more  hesitant  than  the  latter;  (-cautious  is  similarly  in  between  cautious  and  fasipass. 
As  e  increases  both  f-strategies  become  more  like  fasipass.  As  e  shrinks,  the  two  f-strategies  more 
closely  mimic  their  (non  e)  counterparts. 

The  primary  concern  with  shuffle  is  it  may  spend  too  long  solving,  for  example,  the  stage  2, . . . ,  T 
subproblems  with  respect  to  a  poor  first  stage  decision.  The  quality  of  the  information  passed  up 
the  tree  is  high  (the  cuts  are  supports  of  the  recourse  function),  but  too  much  effort  may  be  spent 
computing  the  cuts.  Cautious,  on  the  other  hand,  may  spend  too  much  effort  generating  stage  t  —  1 
cuts  when  the  stage  t  cuts  do  not  yet  give  a  good  approximation  of  the  expected  costs  to  be  incurred 
in  stages  t+ 1, . . . ,  T.  The  "best"  tree  traversing  strategy  will  properly  balance  the  quality  of  the  cuts 
(and  hence  the  lower  bound)  with  the  computational  effort  required  to  generate  them.  The  search 
for  this  balance  motivates  considering  strategies  that  range  between  shuffle  and  cautious.  Fasipass 
is  one  such  strategy;  the  e-strategies  enable  us  to  more  fully  investigate  intermediate  strategies. 

Tree  Traversing  Theorem 

In  shuffle,  a  subproblem  only  receives  a  cut  from  its  descendants  after  each  descendant  subproblem  is 
solved  with  respect  to  its  descendants.  In  this  context,  solved  means  the  upper  and  lower  bounds  on 
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the  optimal  objective  function  value  coincide.  The  tree  traversing  theorem  states  that  a  cut  passed 
back  to  a  subproblem  "prematurely"  (i.e.,  while  the  gaps  in  descendant  objective  function  bounds 
are  still  positive)  is  a  valid  cut. 

Theorem  3  A  cut  passed  back  to  sub(u>t)  when  the  subproblems,  sub(u)tJr\),  ujt+\  £  A(u>i),  are  not 
solved  with  respect  to  their  descendants  (due  to  insufficient  cuts)  is  a  valid  cut. 

Proof 

This  result  follows  directly  from  Lemma  1  and  Lemma  2:  The  subproblems  contain  valid  cuts,  and 
the  dual  variables  associated  with  optimal  solutions  to  the  subproblems  are  dual  feasible.  ■ 

In  light  of  Theorem  3,  one  has  a  great  deal  of  freedom  in  designing  algorithms  with  respect  to 
the  order  in  which  the  subproblems  of  the  decision  tree  are  solved.  Our  goal  is  to  devise  strategies 
that  allow  us  to  solve  large-scale  multistage  stochastic  linear  programs  as  quickly  as  possible. 

Formal  Strategy  Definition 

Absolute  error  and  discrepancy  are  two  useful  concepts  in  formally  defining  the  tree  traversing 
strategies.  The  absolute  error,  At{x^' , ut),  for  the  stage  t  subproblem  under  scenario  u>t,  i.e.,  sub(u>j), 
is 

[i=t+l  Wi€A-«(u>i)  J 

This  expression  is  the  difference  between  upper  and  lower  bounds  on  the  optimal  objective  function 

value  for  sub(u)t);  note  that  it  depends  on  sub(u;()'s  right-hand-side  and  hence  its  ancestor's  de- 
cisions. The  absolute  error  for  the  stage  T  subproblems  is  zero  because  these  linear  programs  are 
solved  directly.  The  A  notation  (see  §2)  may  also  be  applied  to  a  set:  A(5)  =  (Jj€<s  ^(5)'  ^"  means 
apply  A  n  times,  e.g.,  A2(u><)  =  A(A(u>()). 

Scott  [12]  defined  discrepancy,  T>t,  in  the  deterministic  case.  We  extend  the  notion  of  discrepancy 
to  the  stochastic  setting. 

is  the  discrepancy  for  sub(u;t).  The  second  term  in  the  discrepancy,  Of  ,  represents  sub(u>t)'s  estimate 
of  the  expected  cost  to  be  incurred  by  its  descendants  in  stages  t + 1, . . . ,  T..  The  first  term  represents 
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the  conditional  expected  value  of  the  cost  realized  in  stage  t  +  1  plus  the  stage  t  +  1  estimate  of  the 
expected  cost  in  stages  t  +  2, ...,T.  The  discrepancy  for  any  stage  T  subproblem  is  zero. 

We  now  define  two  subroutines:  forward(u,v)  and  backward(v,u).  The  former  sweeps  forward 
from  stage  u  to  stage  v.  It  first  forms  the  right-hand-sides  of  the  appropriate  stage  u  subproblems 
and  solves  them,  and  then  it  forms  the  right-hand-sides  of  the  appropriate  stage  u  + 1  subproblems 
and  solves  them  . . .  until  it  has  solved  all  the  appropriate  stage  v  subproblems.  On  the  other  hand, 
backward(v,u)  first  passes  cuts  to  the  appropriate  stage  v  subproblems  and  solves  them,  and  then 
passes  cuts  back  to  the  appropriate  stage  v  —  1  subproblems  and  solves  them  . . .  until  it  has  passed 
cuts  back  to  the  appropriate  stage  u  subproblems  (it  does  not  solve  them).  In  the  execution  of 
forward(u,  v)  and  backward(v,  u),  it  is  not  always  necessary  to  solve  every  subproblem  that  we  might 
address.  For  example,  in  subroutine  forward(u,  v),  some  of  the  stage  u  subproblems  (and  therefore  its 
descendants)  might  have  a  sufficiently  small  absolute  error,  and  in  subroutine  backward(v,u),  some 
of  the  stage  v  subproblems  (and  possibly  its  ancestors)  might  have  a  sufficiently  small  discrepancy. 
For  declaring  specific  subproblems  temporarily  "solved"  in  this  fashion,  we  use  discrepancies  in  the 
cautious  strategies  and  absolute  errors  in  the  shuffle  and  fastpass  strategies. 

Because  the  Benders  cuts  form  an  outer  linearization  of  the  recourse  function,  T>t(x"'  ,u>t)  >  0. 
Furthermore,  Vt(x^'  ,u>t)  =  0  implies  the  cut  that  would  be  passed  to  sub(u>t)  is  redundant.  Two 
more  useful  facts  relating  absolute  error  and  discrepancy  are: 

EWtVt(x?\ut)  =  E„,Mx?\u,t)-E^1At+l(x'?;+1\u,t+1)  (6) 

T-\ 

Al(x"1\u1)=J2E«,Vt(x?',"t)-  (7) 

When  .4i(£i,u>i)  <  toler  -rm.n(\U\,  |£|),  SLP-T  is  declared  to  be  solved  where  toler  is  a  prespecified 
tolerance.  U  and  L  denote  the  upper  and  lower  bounds  on  the  optimal  objective  function  value 
that  the  decomposition  algorithm  continually  updates.  The  definitions  of  sufficiently  small  expected 
absolute  error  and  sufficiently  small  expected  discrepancy  used  in  the  tree  traversing  strategies  are 
motivated  by  (6)  and  (7).  The  cautious  and  (-cautious  strategies  begin  with  one  iteration  of  fastpass 
so  initial  upper  and  lower  bounds  on  the  optimal  objective  function  value  may  be  defined. 
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(1)  fastpass 

step  0  define  toler;  set  iter  =  1 

step  1  forward(l,T) 

step  2  if  A\  (xi ,  u>i)  <  toler  ■  min(\U\,  \L\)  then  stop:  optimal  solution  at  hand 

step  3  backward(T  —  1,1);  iter  =  iter  +  1 

step  4  go  to  step  1 

(2)  shuffle 

step  0  define  toler;  set  iter  =  1,  t  =  1 

step  1  forward(t,T) 

step  2  if  ,4i  (zj  ,u>i)  <  to/er  •  mtn(|{/|,  \L\)  then  stop:  optimal  solution  at  hand 

step  3  <  =  moi{i   :   E^^iix"' ,u>,)  >  IfP^ -toler  ■  mm{\U\,\L\)) 

step  4  backward(t,  t) 

step  5  if  t  =  1  then  iter  =  iter  +  1 

step  6  go  to  step  1 

(3)  cautious 

step  0     define  toler;  set  iter  =  1,  tj  =  1,  t2  =  2 

step  1     apply  one  iteration  of  the  fastpass  algorithm;  iter  =  iter  +  1 

step  2     forward(ti,t2) 

step  3     if  <2  =  T  then 

iter  =  iter  +  1; 

if  «4i(zi,wi)  <  toler  ■  mtn(\U\,  \L\)  then  stop:  optimal  solution  at  hand 

step  4     if  Euta_1  Vta-i(x"*i~l ,ut3-i )  <  ^  ■min(|I7|,|I|)  then 

<i  =  <2  +  l;  <2  =  ti 
else 

backward(t2  —  1,1); 

*i  =  1;  *2  =  2 

step  5     go  to  step  2 

(4)  t-shuffle 

step  0  define  toler,  e;  set  »ter  =  1,  t  =  1 

step  1  forward(t,T) 

step  2  if  A\ {x\ , ui )  <  <o/er  ■  min(|£/|,  |Z,|)  then  stop:  optimal  solution  at  hand 

step  3  t  =  max{max{i   :   EUlA,{x^' ,  w,)  >  ffy  •  c  •  min(|t/|,  |I|)}  ,  1 } 

step  4  backward(T  —  1,  <) 

step  5  if  t  =  1  then  »<er  =  iter  +  1 

step  6  go  to  step  1 

(5)  (-cautious 

step  0     define  toler,  c;  set  iter  —  1,  %\  =  1,  <2  =  2 

step  1     apply  one  iteration  of  the  fastpass  algorithm;  iter  =  iter  +  1 

step  2     forward(t\,t2) 

step  3     if  <2  =  71  then 

i<er  =  iter  +  1; 

if  A\  (xi ,  u>i )  <  (o/er  •  min(|l/|,  \L\)  then  stop:  optimal  solution  at  hand 

step  4     if  £*,,_, Z>«j-i(z7a-7,>u'«J-1)  ^  t^T  -  wii«(|i/|,  |^j)  and  i,  <  T  -  1  then 

«i  =  fe  +  1;  <2  =  U 
else 

backward(t-2  —  1,1); 

ti  =  1;  <2  =  2 
step  5     go  to  step  2 
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Computational  Results 

Table  7  restates  the  base  case  (fasipass)  strategy's  results  for  convenient  reference.  Tables  8  and  9 
use  "x  increase"  with  respect  to  fasipass  as  the  performance  measure;  if  this  ratio  is  less  than  1.0, 
the  strategy  outperforms  fasipass  on  that  particular  problem.  By  examining  the  #  Subs  columns  in 
Tables  7  and  8  one  can  contrast  the  distribution  of  computing  effort  per  stage  for  three  strategies. 


Name 

fasipass 

CPU  (sec.) 

iter. 

#Subs 

Moke3.9 

55.4 

10 

10-59-99 

Ybsf3.9 

119.8 

7 

7-47-72 

Moke4.45 

221.4 

10 

10-64-274-459 

Ybsf4.45 

404.0 

8 

8-83-205-369 

Table  7:  Base  Case  Strategy  -  fasipass 


Name 

cautious 

shuffle 

x  increase 

iter. 

#  Subs 

x  increase 

iter. 

#  Subs 

Moke3.9 

0.92 

7 

19-82-72 

1.08 

7 

7-51-135 

Ybsf3.9 

1.03 

6 

21-86-63 

1.21 

6 

6-45-117 

Moke4.45 

1.06 

8 

27-140-449-369 

2.10 

4 

4-40-296-852 

Ybsf4.45 

1.00 

6 

29-202-274-279 

2.18 

8 

8-71-263-771 

Table  8:  cauiious  and  shuffle 

In  Table  9  the  "x  increase"  column  is  the  average  of  the  CPU  ratios  for  the  four  test  problems.  This 
value  is  displayed  for  some  specific  values  of  e  in  the  (-cautious  and  t-shuffle  strategies. 


(.-cautious 

e- 

shuffle 

e 

x  increase 

e 

x  increase 

0.0001 

1.00 

0.0001 

1.64 

0.0004 

0.98 

0.0004 

1.52 

0.0016 

0.97 

0.0016 

1.32 

0.0064 

0.95 

0.0064 

1.16 

0.0256 

1.06 

0.0256 

1.08 

0.1024 

1.01 

0.1024 

0.96 

0.4096 

1.01 

0.4096 

1.00 

Table  9:  e-cauiious  and  (-shuffle 

The  computational  results  indicate  the  fasipass,  cautious,  and  e-cautwus  strategies  perform  com- 
parably and  outperform  shuffle  and  e-shuffle.  It  is  only  as  e  becomes  large  (and  hence  (-shuffle 
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approaches  fastpass)  that  (.-shuffle  performs  comparably.  It  is  interesting  to  note  how  little  the 
(-cautious  running  times  change  as  (  varies.  An  (-cautious  strategy  seems  appealing  in  a  multistage 
problem  (especially  with  discounting)  where  there  is  a  desire  to  avoid  the  computationally  expensive 
later  stages.  In  these  problems,  however,  there  was  not  a  significant  difference  between  the  three  and 
four  stage  problems  for  the  cautious  (also  (-cautious)  strategies.  On  the  other  hand,  the  shuffle  (also 
(-shuffle)  strategy's  performance  is  significantly  worse  on  the  four  stage  problems.  As  one  might 
expect,  the  "extreme"  strategies'  performance  deteriorates  when  the  advanced  start  procedure  is 
removed;  the  average  CPU  ratios  of  cautious  and  shuffle  to  fastpass  in  this  case  are  1.13  and  2.07, 
respectively. 

5      Direct  LP  Optimizers 

The  performance  of  the  enhanced  decomposition  algorithm  (i.e.,  the  base  case)  and  general  LP 
optimizers  on  an  enlarged  set  of  test  problems  is  summarized  in  Table  10.  These  eight  problems  are 
based  on  the  Mokelumne  and  Yuba-Bear-South  Feather  river  basin  models  with  1,  9,  27,  and  45 
scenarios.  We  use  the  "x  increase"  measure  with  respect  to  the  base  case  for  three  other  LP  solution 
strategies:  (i)  the  primal-dual  predictor-corrector  interior  point  algorithm  as  implemented  in  IBM's 
OSL  Release  2;  (ii)  this  same  interior  point  algorithm  followed  by  the  simplex  method  to  generate  an 
extreme  point  solution;  (iii)  the  primal  simplex  method  as  implemented  in  CPLEX  2.0.  The  results 
show  that  on  single  scenario  problems,  the  decomposition  algorithm  is  outperformed  by  general  LP 
optimizers,  but  as  the  number  of  scenarios  grows,  the  decomposition  algorithm  is  preferable. 

The  decomposition  algorithm  terminates  when  the  relative  error  is  less  than  10-4.  The  same 
duality  gap  tolerance  was  used  in  both  interior  point  solution  strategies;  all  tests  were  performed 
on  a  Hewlett  Packard  9000/750  workstation.  Due  to  memory  limitations  (OSL's  dspace  array  was 
allocated  64  Mb)  we  were  unable  to  solve  the  largest  (Ybsf4.45)  problem  via  the  interior  point 
strategies.  Recall  (§2)  that  the  stochastic  hydro  scheduling  problems  are  subproblems  in  a  larger 
nonlinear  Dantzig-Wolfe  decomposition  algorithm.  Thus  extreme  solutions  are  desirable,  and  this  is 
why  the  corresponding  time  to  generate  an  optimal  extreme  point  solution  (via  the  simplex  method) 
from  the  interior  point  solution  is  shown  in  Table  10. 
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Name 

Base 
Case 

Int.  Pt. 

Int.  Pt. 
— *  Simplex 

Simplex 

(sec.) 

x  increase 

Moke4.1 

16.8 

0.32 

0.46 

0.40 

Moke4.9 

85.0 

1.00 

2.41 

6.31 

Moke4.27 

132.9 

2.26 

6.94 

27.54 

Moke4.45 

221.4 

3.68 

11.37 

47.18 

Ybsf4.1 

48.1 

0.23 

0.42 

0.59 

Ybsf4.9 

177.3 

1.12 

2.94 

16.04 

Ybsf4.27 

252.9 

2.15 

14.84 

61.53 

Ybsf4.45 

404.0 

N/A 

N/A 

100.20 

Table  10:  Comparison  with  Direct  LP  Optimizers 

6     Summary 

We  have  described  a  number  of  enhancements  to  the  nested  Benders  decomposition  algorithm  for 
multistage  stochastic  linear  programming.  The  enhanced  algorithm  is  a  small,  but  important  part 
of  an  ongoing  research  and  development  project  at  The  Pacific  Gas  and  Electric  Company;  see 
Jacobs  et  al.  [10]  for  a  more  detailed  description  of  the  project.  The  performance  of  the  algo- 
rithm was  examined  on  a  collection  of  multistage  stochastic  hydroelectric  scheduling  problems.  The 
computational  results  indicated  the  single  most  important  enhancement  is  a  warm  start  technique 
which  utilizes  optimal  basis  information  from  previous  subproblem  solutions.  We  also  described  a 
streamlined  advanced  start  procedure  that  generates  preliminary  cuts  to  help  guide  the  early  iter- 
ations of  the  decomposition  algorithm;  this  enhancement  provided  additional  speedup  over  naive 
implementations.  We  found  the  multicut  method  due  to  Birge  and  Louveaux  [4]  also  yielded  com- 
putational savings  over  its  single  cut  counterpart.  Consistent  with  earlier  findings  of  Abrahamson 
[1]  and  Wittrock  [15]  (in  the  deterministic  case)  and  Gassmann  [7]  (in  the  stochastic  case)  we  found 
that  the  fastpass  tree  traversing  strategy  performed  well.  However,  a  new  class  of  (-cautious  tree 
traversing  strategies  produced  comparable  results  to  fastpass;  further  investigation  of  this  class  of 
strategies  may  be  warranted  for  problems  with  many  stages.  Finally,  we  showed  that  taking  advan- 
tage of  the  problem's  special  structure  through  the  enhanced  decomposition  algorithm  provides  a 
computationally  attractive  alternative  to  direct  LP  optimizers  on  medium  to  large-size  problems. 
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